This is a walkthrough demonstrating how to generate SWNE plots alongside the Seurat manifold alignment pipeline from three pancreas datasets generated using different single cell RNA-seq technologies.

To save time we will be using the pre-computed Seurat object pancreas_integrated_seurat.Robj, which can be downloaded here.

First let’s load the required libraries

library(Seurat)
library(swne)

Let’s load the Seurat object

se.obj <- readRDS("~/swne/Data/multiple_pancreas_seurat_v3.RObj")

We can extract the integrated expression matrix

aligned.counts <- as.matrix(GetAssayData(se.obj, assay = "integrated"))

Note that this matrix has negative values, which we will have to rescale in order to run NMF

dim(aligned.counts)
## [1] 2000 5683
hist(as.matrix(aligned.counts))

aligned.counts <- t(apply(aligned.counts, 1, function(x) (x - min(x))/(max(x) - min(x))))

Once we have the “reconstructed matrix”, we can run NMF and embed the components

k <- 20
n.cores <- 8

nmf.res <- RunNMF(aligned.counts, k = k, alpha = 0, init = "ica", n.cores = n.cores)

pc.emb <- Embeddings(se.obj, "pca")
snn <- CalcSNN(t(pc.emb), k = 20, prune.SNN = 0)

swne.embedding <- EmbedSWNE(nmf.res$H, snn, alpha.exp = 1.5, snn.exp = 1, 
                            n_pull = 3)
## Initial stress        : 0.29636
## stress after  10 iters: 0.16434, magic = 0.004
## stress after  20 iters: 0.14331, magic = 0.043
## stress after  30 iters: 0.09593, magic = 0.500
## stress after  40 iters: 0.09127, magic = 0.500
## stress after  50 iters: 0.09041, magic = 0.500
## stress after  60 iters: 0.09006, magic = 0.500
## stress after  70 iters: 0.08971, magic = 0.500
## stress after  80 iters: 0.08968, magic = 0.500

We project the full gene expression matrix to get gene loadings for the full set of genes

norm.counts <- ExtractNormCounts(se.obj, rescale = T, rescale.method = "log", batch = NULL)
## calculating variance fit ... using gam
nmf.res$W <- ProjectFeatures(norm.counts, nmf.res$H, n.cores = n.cores)

We use these gene loadings to identify key genes to embed, and we hide all the factors

gene.factor.df <- SummarizeAssocFeatures(nmf.res$W, features.return = 1)
genes.embed <- unique(gene.factor.df$feature)

swne.embedding <- EmbedFeatures(swne.embedding, nmf.res$W, genes.embed, n_pull = 3)
swne.embedding$H.coords$name <- ""

We pull out the clusters and batches

clusters <- se.obj$celltype
batch <- se.obj$tech

We can then create the SWNE plot

PlotSWNE(swne.embedding, alpha.plot = 0.4, sample.groups = clusters, do.label = T, 
         label.size = 3.5, pt.size = 1.25, show.legend = F, seed = 42)

We also can show that there are no batch effects

PlotSWNE(swne.embedding, alpha.plot = 0.4, sample.groups = batch, do.label = F, 
         label.size = 3.5, pt.size = 0.25, show.legend = T, seed = 42)

UMAP plot for comparison

tsne.emb <- Embeddings(se.obj, "umap")
PlotDims(tsne.emb, sample.groups = clusters, show.legend = F, seed = 42,
         show.axes = F)